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Abstract. Numerical aspects are investigated in ultra-large-scale electronic structure 
calculation. Accuracy control methods in process (molecular-dynamics) calculation are 
focused. Flexible control methods are proposed so as to control variational freedoms, 
automatically at each time step, within the framework of generalized Wannier state 
theory. The method is demonstrated in silicon cleavage simulation with 10 2 -10 5 atoms. 
The idea is of general importance among process calculations and is also used in Krylov 
subspace theory, another large-scale-calculation theory. 
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1. Introduction 

Nowadays one of most important scientific fields is process of nanostructure, structure 
in nanometer- or ten-nanometer scales, particularly, for controllability of its structure 
and function. Electronic structure calculation in these purposes should be carried 
out with a large system (10 3 atoms or more) and a meaningful timescale. For a 
decade, on the other hand, many calculation methods and related techniques have 
been proposed so as to handle such large systems. See reviews [U [2] and original 

works, [si si isi isi cn isi isi noi nn n^i ubi nai nsi nsi iizi eibi nsi 1201 inii 1221 in these 

methodologies, one-body density matrix or Green's function is calculated, instead of 
one-electron eigenstates and the calculation is carried out with real-space representation. 
A physical quantity (X) is given as a trace form 

(X)=Tr[pX} = J J drdr'p(r,r')X(r',r). (1) 

with the density matrix p. If the matrix X(r, r') is of short range, the off-diagonal long- 
range component of the density matrix does not contribute to the physical quantity 
(X), which is crucial for practical success of large-scale calculations. [7] 




Number of atoms (N) Number of processors 

Figure 1. (color online) Computational time of the ultra-large-scale calculation with 
up to 11,315,021 atoms. The time of our methods are plotted for fee Cu, liquid C and 
bulk Si. Optimal solver method is chosen for each system; [5T] Wannier state theory in 
perturbative procedure is chosen for bulk silicon and Krylov subspace theory is chosen 
for other cases, (a) Comparison among materials. As a reference data, the time of the 
conventional eigenstate calculation is also plotted for fee Cu with single CPU. See Refs. 
P~5l[l71[2T] for details. The computations were carried out using Intel or SGI CPUs. 
In parallel computation, the number of processors (CPU cores) is indicated inside the 
parenthesis, (b) The time of bulk silicon with 1,423,909 atoms and 11,315,021 atoms, 
is measured as a function of processors using SGI origin 3800™. 

As ones of these works, we have developed a set of theories and program 
codes and applied them to silicon, carbon and metal systems with Slater-Koster-form 
Hamiltonians. [T5], [T6J, [T7J [TU (20J, [21] These theories are constructed from fundamental 
theory of generalized Wannier state or Krylov subspace. An overview is given in Ref. [21J . 
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Figure [T] demonstrates our methods with or without parallel computers, in which the 
computational cost is 'order- N' or linearly proportional to the system size (N), up to 
ten-million atoms and shows a satisfactory performance in parallel computation. We 
note that the electronic property, such as density of states, is also calculated. [T6l [T9l |2T] 
These large-scale-calculation methods have controlling parameters for calculating 
electronic freedoms, which gives accuracy and computational cost. In the present 
paper, we will introduce flexible methods of controlling electronic freedoms for optimal 
computational cost, and will be demonstrated within the framework of generalized 
Wannier state theory. The methods are crucial, particularly, in a dynamical process 
or a molecular dynamics (MD) calculation. This paper is organized as follows; An 
overview of theory and example of silicon cleavage are given in Sec [2j Then the flexible 
control methods are introduced and demonstrated in Sec. We point out that similar 
flexible control methods are used in Krylov subspace theory. Section H] is devoted to 
concluding remarks. 

2. Theoretical overview and examples 

The calculation in this paper was carried out in the theoretical framework of generalized 
Wannier state. [23l [Ml O O [251 HOl [26l [271 12HI [29] A physical picture of the generalized 
Wannier states {(fif^^} is localized chemical wave function in condensed matters, such 
as a bonding orbital or a lone-pair orbital with a slight spatial extension or 'tail'. The 
suffix % of a wavefunction 0^ WS ' ) indicates the position of its localization center, such as 
bond site. Their wavefunctions {(j)^^} are equivalent to the unitary transformation of 
occupied eigen states and the density matrix is given as 



where wavefunctions are described as real number. The Wannier state theory is 
suitable for large systems, particularly, when a dominant number of wavefunctions are 
well localized. The present calculations were carried out by a variational procedure 



Hereafter silicon cleavage process is calculated with a transferable Hamiltonian 
in the Slater-Koster form [30J. Nanometer-scale or ten-nano- meter-scale samples are 
cleaved under external load. Figure [2] shows examples of the resultant cleaved samples 
that contain experimentally observed cleavage planes, (111) and (110) planes; In Fig, 
EJ^a), the resultant sample contains a cleaved Si(lll)-2xl surface. [17] A pair of five- 
and seven-membered rings appears in the cleavage propagation direction, [211] direction, 
which forms the unit cell of the 2x1 structure called Pandey structure. [3T], [32], [33] As an 
interesting feature of the present result, the cleaved surface contains a step structure with 
a six-membered ring at the step edge, which is compared to experiments. [T7J As details, 
the sample consists of 1,112 atoms and the periodic boundary condition is imposed, by 
eight atomic layers, in the direction perpendicular to the cleavage propagation direction. 
In the present case, an additional periodicity, by two atomic layers, is imposed as a 
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Figure 2. (color online) Simulation results of silicon cleavage, (a) A sample with 
1,112 atoms. The resultant cleaved surface shows a (lll)-2xl reconstruction and 
contains a step structure, (b) A sample with 10,368 atoms. The resultant cleaved 
surface shows a buckled (110) reconstruction. 

constraint on the atomic structure. We note that the 2x1 structure appears even without 
the additional periodicity. See papers [17] for more details and results of larger samples 
with 10 5 atoms. In Fig,|2^b), the resultant cleaved surface is a buckled (110) surface that 
appears in textbooks in surface physics or papers such as Refs. [3H [35] . As details, the 
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sample consists of 10,368 atoms and the periodic boundary condition is imposed, by eight 
atomic layers, in the direction perpendicular to the cleavage propagation direction. The 
physical discussions in cleavage dynamics are found in Refs. [T5l [T71 12T] and reference 
therein. 
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Figure 3. (color online) (a) The stress value of nanocrystalline silicon of 91 atoms 
with thermal vibration, in which the sample is stretched by a [001] uniaxial load. The 
calculations were carried out by the Wannier state method with (i) 'constant cutoff' 
(WS-CC), (ii) 'flexible control at the first level' (WS-FC1), and (iii) 'flexible control 
at the second level' (WS-FC2). See text for explanation. The conventional eigen state 
method (Eig) was also carried out as a reference data, (b) The same data set as in (a) 
are plotted but the data of the CC method is ignored, so as to clarify a significantly 
better agreement among the other methods. 



3. Flexible methods for accuracy control 

3.1. Three methods in Wannier state theory 

Here we describe the accuracy control methods [151 \T7\ [29] used in the above MD 
calculation. In generalized Wannier state theory, the region for localization constraint 
for each Wannier state is variational freedoms that governs accuracy and computational 
cost. Therefore we will concentrate the methods of setting the localization region for 
each wavefunction at each time step. 

Here the three methods of accuracy control in the Wannier state calculation are 
proposed. Among all the methods, the localization constraint on each wavefunction 
4 - WS ' ) is imposed as a spherical region whose center is the weighted center of the 
wavefunction r| ws ^ = (0 4 - WS ^|f|0 4 - WS ^). Therefore, the cutoff radius of the spherical 
region, denoted R{ , mainly contributes to accuracy. We also denote iv/ ws ^ as the 
number of atoms inside the localization region of the i-th Wannier state. Three methods 
for determination of the radius are used; (i) 'constant cutoff' method (WS-CC method) 
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(ii) 'flexible control method at the first level' (WS-FC1 method), and (iii) 'flexible 
control method at the second level' (WS-FC2 method). See below for explanation of 
these methods. 

The method is demonstrated in nanocrystalline silicon of an isolated cubic sample 
with 91 atoms. [29J The sample is thermally vibrated with 300 K and an additional slow 
constant- velocity motion is introduced for the atoms on the sample surface. As a result, 
the sample is stretched in the [001] direction with thermal vibration. Figure [3] shows the 
trajectory of calculated stress a. In Figj3](a), the results of three controlling methods 
for Wannier states are compared. Figure [3](a) also contains a result of conventional 
eigen-state calculation as a reference data, in which the temperature (level-broadening) 
parameter of r = 0.1 eV is used for electronic system. 

In the WS-CC method of Fig. [HI the radius is chosen to be a constant value of 
-R(ws) = 2.5d , where do(= 2.35A) is the equilibrium bond length. This value is chosen 
for all Wannier states through the simulation. Without an external load, this radius sets 
the localization region of the Wannier states to about N^ s = 40 atoms. We should say 
that a results with the CC method is expected to be rather poor, because the sample 
in the present MD simulation will be stretched by the external load and the number of 
atoms within the localization region tends to decrease during the MD simulation. This 
point will be confirmed numerically in the last paragraph of the present subsection. 

A better way for accuracy control is to give the number of atoms in the localization 
region, A^ ws , instead of a given radius -R! WS ' ) , which realized a flexible control for the 
localization radius. In this method, the radius R[ WS ^ is chosen so that the localization 
region contains a given number, j\rf ws ' min ' ) , of atoms or more. This method is called 
flexible control method at the first level (WS-FC1 method). In Fig. [3]Ja) we choose 
the value of jvj ws,:min ) = 40 . In results, the localization radius i?| ws ^ may be different 
among the Wannier states and the number of atoms within the localization region (N^ 3 ) 
always satisfies iVj WS) > iy( ws < min ) =40. 

Now we explain the third method for setting the localization region, called flexible 
control method at the second level (WS-FC2 method). In the program code, an iterative 
solution procedure is carried out for an equation of generalized Wannier states. See 
Refs. [lOj EH [29] for the explicit expression of the equation. Since the residual of the 
equation (5 fa) is well defined for each wavefunction </>[ WS ' 1 , the accuracy of a calculated 
wavefuncion can be rigorously monitored by the residual norm \5(f>i\. The residual 
norm vanishes, when the calculated wavefuncion will be exact (\5(f>i\ — ► 0). When 
the wavefunction <pi has a large residual norm \S(pi\, a larger number of atoms (A^ ) 
should be assigned inside the localization region so as to reduce the residual norm \S(j)i\. 
In the present code, the assignment is carried out automatically for each wavefunction at 
each time step. In the calculation with Fig. EJ^a), we classify all wavefunction into three 
classes with different numbers j\rf WS ' mm ' ) ; jvf WS,mm '' = 40,60 or 80. The classification 
procedure is carried out with the averaged value (50 av of the residual norm among all 
wavefunctions {|<5<^i|} • If the residual norm of a wavefunction (|5</>i|) is almost the same 
as its averaged value < 1.2<50 av ), the number of j\rf ws,mm ' ) is set to be the small 
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^(WS.min) = 4Q) _ j f regidual 

norm of a wavefunction (|<50i|) is slightly larger 
than its averaged value (1.2<50 av < \5<pi\ < 1.550 av ), the number of _/v^ WS ' mm ^ is set to be 
the middle one 

^(WS.min) = QQ y j f ^ regidual 

norm of a wavefunction is larger 

than 150 % of its averaged value (1.5<50 av < |5</>i|), the number of jy^CW ( min; ^ g ge ^. ^ Q ^ g 
the large one (ivf ws,mm) = 80). 

When the three control methods with Wannier states, WS-CC, WS-FC1 and WS- 
FC2 methods, are compared, in Fig. [3(a), with a reference data by the conventional 
eigen state method, one finds that the flexible control methods, FC1 and FC2 methods, 
are siginificantly better in accuracy than the CC method. This statement is clarified in 
Fig. [3(b), when the trajectories without that of the CC method show a better agreement. 

3.2. Discussions 

Although the flexible control methods give, in general, a better accuracy during the 
MD simulation than the CC method, any of the three methods, WS-CC, WS-FC1 and 
WS-FC2 methods, is sufficient for discussing physical quantities in the present case; 
For example, the averaged gradient of Fig. [3] is proportional to the Young modulus in 
the [001] direction (£ioo), because the stretching motion is realized within a constant 
velocity. The Young modulus is estimated, commonly among four calculation methods, 
to be Eioo ~ lOOGPa, where the estimated value may include an error on the order of 
10 %. The estimated value is comparable with the experimental bulk value Eioo ~ 130 
GPa but is deviated, owing to the small system size. Satisfactory results are given also 
for critical stress for cleavage; cr c = 2.5 — 3.0 GPa. Moreover the cleavage propagation 
velocity (not shown) agrees well among the three Wannier state calculations and the 
eigen-state calculation. Note that the discussion of these quantities in nano crystalline 
silicon is given in Ref. [15j . 

The WS-FC2 method is required in several simulations and one example is the 
case of Fig. [2(a) or silicon cleavage with Si(lll)-2xl cleaved surface; The elementary 
reconstruction process occurs among several bond sites including surface and subsurface 
layers, p2] and a larger region is required for describing wavefunctions near the cleaved 
surface. Since the number of wavefunctions near the cleaved surface accounts for 
only a small fraction, typically 10 %, of the total number of wavefunctions, the total 
computational cost of the FC2 method is moderate, when compared with the CC 
method. 

Finally we note that a similar flexible control is also used in Krylov-subspace theory, 
another theory for large-scale-calculation theory. See Appendix of Ref. [21 J . 

4. Concluding remarks 

In this paper, we focus the way of accuracy control in dynamical process or MD 
simulation. Flexible control methods are proposed so as to realize large-scale process 
(MD) calculation, in which the electronic freedoms are determined optimally at each 
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time step. 

Since nature (or electronic structure) of physical system can change during a 
dynamical process, flexible control methods proposed here are crucial, generally, among 
large-scale calculations, when one would like to achieve a proper balance between 
accuracy and computational cost. 
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